Abstract

Protein structure is commonly regarded as conserved and as dictating function, yet most proteins rely on conformational flexibility to some degree. Are regions that confer conformational flexibility conserved over evolutionary time? Can changes in conformational flexibility alter protein function? Here, the evolutionary dynamics of structurally ordered and disordered (flexible) regions are investigated genome-wide in flaviviruses. The amount and location of structural disorder fluctuate considerably among related proteins, and some regions are prone to shift between structured and flexible states. Increased evolutionary dynamics of structural disorder are observed in some lineages but not in others. Lineage-specific transitions of this kind could alter the conformational ensemble accessible to the same protein in different species, causing a functional change even if the predominant function remains conserved. Thus, rapid evolutionary dynamics of structural disorder are a potential driving force for phenotypic divergence among flaviviruses.
Introduction

Two central tenets of molecular biology are that protein structure is more conserved than protein sequence and that protein structure is crucial for protein function. However, proteins are dynamic and exist, to various extents, as conformational ensembles (Gunasekaran et al. 2004). Biologically active proteins without unique 3D structures occupy a large portion of any proteome (Romero et al. 1998; Dunker et al. 2000; Uversky et al. 2000; Ward et al. 2004; Xue et al. 2012). These proteins possess numerous intriguing properties. They are intimately involved in various cellular processes (Wright and Dyson 1999; Dunker et al. 2001; Dunker et al. 2002; Iakoucheva et al. 2002; Tompa 2002; Uversky 2002; Dunker et al. 2005; Dyson and Wright 2005; Uversky et al. 2005; Vucetic et al. 2007; Xie et al. 2007; Kim et al. 2008; Oldfield et al. 2008; Liu et al. 2009; Wright and Dyson 2009) and are commonly found to be related to the pathogenesis of various diseases (Uversky et al. 2008). They frequently take part in complex protein-protein, protein-nucleic acid, and protein-small molecule interactions, some of which can induce a disorder-to-order transition in the entire protein or in part of it (Wright and Dyson 1999; Uversky et al. 2000; Dunker et al. 2001; Tompa 2002; Uversky 2002; Dyson and Wright 2005; Oldfield et al. 2005; Mohan et al. 2006; Vacic et al. 2007; Dosztanyi et al. 2009; Meszaros et al. 2009; Wright and Dyson 2009; Uversky 2011). Furthermore, conformational flexibility lets a single protein interact with several unrelated binding partners and adopt a different bound structure with each (Oldfield et al. 2008; Hsu et al. 2012). The same sequence can therefore adopt multiple conformations. Thus, proteins that exist as conformational ensembles may have more than one functional conformation and multiple functions.
The equilibrium of a protein's conformational ensemble is controlled by various external signals, such as pH, temperature, or binding partners (del Sol et al. 2009). From an evolutionary viewpoint, sequences that adopt a conformational ensemble can propagate biological divergence through multifaceted selective pressures. Sequence changes that alter the equilibrium of the conformational ensemble and cause the loss of a subset of promiscuous functions may be better tolerated than sequence changes that make a protein lose its entire function through misfolding. On evolutionary time scales, changes in the equilibrium of a conformational ensemble may result in highly different conformational ensembles among homologous proteins (Siltberg-Liberles et al. 2011).

Proteins that have intrinsically disordered regions exist as conformational ensembles. Structurally disordered regions can act as dynamic switches (Smock and Gierasch 2009). These proteins have regions, large or small, that rapidly sample multiple conformations because of a shallow, rugged energy landscape (Tsai et al. 2001). As conformations become more ordered or stabilized in response to an external signal, the conformational ensemble undergoes a population shift (Ma et al. 1999). Under the extended conformational selection model, binding events ranging from lock and key to induced fit are plausible (Csermely et al. 2010). It is reasonable to hypothesize that the conformational ensemble also undergoes population shifts in response to sequence divergence. Such shifts would ultimately affect conformational selection in response to different stimuli in a lineage-specific manner, as a mutation-driven conformational selection (Tokuriki and Tawfik 2009; Siltberg-Liberles et al. 2011).

We hypothesize that subtle changes in conformational flexibility can drive biological divergence among proteins that seem structurally and functionally conserved, such as orthologous proteins in closely related species. To test this hypothesis, we investigated how structural disorder, as an approximation for conformational flexibility, changes among Dengue viruses and other flaviviruses, e.g., yellow fever virus, Japanese encephalitis virus, and West Nile virus. According to the World Health Organization, the mosquito-borne Dengue virus alone infects 50 million people worldwide per year, resulting in 22,000 fatalities. Vaccines are available for some flaviviruses (e.g., yellow fever virus, Japanese encephalitis virus, and tick-borne encephalitis virus). Dengue virus vaccines, however, have proven challenging because of the different serotypes and complex antibody binding affinities (reviewed in Heinz and Stiasny 2012). Importantly, flaviviruses depend on conformational flexibility for their life cycle (Heinz and Stiasny 2012). They encode a small RNA genome that is expressed as a polyprotein (approximately 3,400 residues in Dengue virus) containing 11 separate protein chains. We therefore investigated the evolutionary dynamics of structural disorder across the flavivirus genome and its implications for phenotypic divergence.



Methods 

Polyprotein Composition 

The 11 proteins encoded by the polyprotein in DENV-1 are capsid protein (C), membrane glycoprotein precursor (Mp), envelope protein (E), nonstructural protein 1 (NS1), nonstructural protein 2A (NS2A), nonstructural protein 2B (NS2B), nonstructural protein 3 (NS3), nonstructural protein 4A (NS4A), 2K protein (2K), nonstructural protein 4B (NS4B), and nonstructural protein 5 (NS5). Protein 2K is only 23 residues long and was therefore excluded from this study.

Protein Phylogenies 

An NCBI BLAST search (Altschul et al. 1990) was run independently for each protein chain in the DENV-1 polyprotein against the genus Flavivirus in the RefSeq and nr databases. Sequences (supplementary table S1, Supplementary Material online) were aligned with MAFFT (Katoh et al. 2002). Model testing was performed using ProtTest (Darriba et al. 2011). The best models from ProtTest (supplementary table S1, Supplementary Material online) were used to build the protein trees with PhyML (Guindon et al. 2005, 2009) with 1,000 bootstrap replicates.

Evolutionary Amino Acid Substitution Rate per Site 

For each site in the different alignments, the evolutionary rate of amino acid substitution was calculated using MEGA5 (Tamura et al. 2011) based on the PhyML trees. Mean (relative) evolutionary rates are scaled so that the average evolutionary rate across all sites is 1. Sites with a rate <1 are therefore evolving slower than average, and sites with a rate >1 are evolving faster than average. These relative rates were estimated under the Jones-Taylor-Thornton (JTT) model (Jones et al. 1992) with a discrete five-category Gamma distribution.
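MEGA5 performs this scaling itself. The short sketch below only illustrates what a relative rate means, assuming a list of raw per-site rate estimates is already available. The function names `to_relative_rates` and `label_sites` are hypothetical and are not part of MEGA5.

```python
from statistics import mean
from typing import List


def to_relative_rates(raw_rates: List[float]) -> List[float]:
    """Rescale per-site rates so that their mean across all sites is 1."""
    if not raw_rates:
        raise ValueError("at least one site is required")
    avg = mean(raw_rates)
    if avg == 0:
        raise ValueError("all rates are zero; relative rates are undefined")
    return [r / avg for r in raw_rates]


def label_sites(relative_rates: List[float]) -> List[str]:
    """Label each site as evolving slower than, faster than, or at the average rate."""
    labels = []
    for r in relative_rates:
        if r < 1:
            labels.append("slower")
        elif r > 1:
            labels.append("faster")
        else:
            labels.append("average")
    return labels


rel = to_relative_rates([0.5, 1.0, 1.5, 3.0])  # toy raw rates, mean 1.5
print(label_sites(rel))  # ['slower', 'slower', 'average', 'faster']
```

Dividing by the mean keeps the ordering of sites unchanged, so only the reference point moves.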

Prediction of Structural Disorder 

For every sequence in the multiple sequence alignments (MSAs) used for the protein phylogenies, structural disorder was predicted using IUPred (Dosztanyi et al. 2005) and PONDR-FIT (Xue, Dunbrack, et al. 2010). All predictions were run on unaligned, ungapped sequences.
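Predictions on ungapped sequences have to be placed back onto alignment columns before homologous sites can be compared. The sketch below shows one way to do that. It assumes `-` is the gap character in the MAFFT output and that each prediction list has one score per residue. The name `project_onto_alignment` is hypothetical.

```python
from typing import List, Optional


def project_onto_alignment(
    aligned_seq: str, scores: List[float], gap: str = "-"
) -> List[Optional[float]]:
    """Map per-residue scores from an ungapped sequence onto alignment columns.

    Gap columns receive None so that later steps can skip them.
    """
    n_residues = sum(1 for c in aligned_seq if c != gap)
    if n_residues != len(scores):
        raise ValueError(
            f"{n_residues} residues in alignment row but {len(scores)} scores"
        )
    remaining = iter(scores)
    # Walk the aligned row: each residue consumes the next score, gaps get None.
    return [None if c == gap else next(remaining) for c in aligned_seq]


row = project_onto_alignment("M-KV-", [0.12, 0.55, 0.71])
print(row)  # [0.12, None, 0.55, 0.71, None]
```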

Analysis 

The evolutionary dynamics of structural disorder can only be studied in an evolutionary context. For every protein phylogeny, the structural disorder value predicted for each residue of each sequence was projected onto the MSA so that comparable sites lined up. These values were visualized as heatmaps using iTOL (Letunic and Bork 2007, 2011). To estimate evolutionary dynamics, the structural disorder predictions were reduced to two states, ordered and disordered. All sites with IUPred prediction values <0.4 were assigned order, and all sites >0.4 were assigned disorder. Similarly, 0.5 was used as the cutoff between order and disorder for PONDR-FIT. For every protein phylogeny and every site in the corresponding MSA, changes of state (order vs. disorder) were evaluated using parsimony as implemented in GLOOME (Cohen et al. 2010). The counts were then normalized by the number of nodes in the phylogeny. A species with a gap at a given position was excluded from the analysis of that site, so every site with data was analyzed. To visualize the amount of disorder-to-order transition (DOT) in a phylogenetic context, all branches with DOTs at 5% or more of the sites, relative to the length of the MSA, were identified as branches showing rapid evolutionary dynamics of DOT.
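GLOOME's own implementation is not reproduced here. As a stand-in, the sketch below uses the textbook Fitch parsimony algorithm on a rooted binary tree to count order/disorder changes at a single alignment column. Taxa with a gap at the column are pruned first, as described above. Two points are assumptions. First, "number of nodes" is taken to mean all nodes (leaves plus internal nodes) of the full phylogeny. Second, states are coded as 0 = ordered, 1 = disordered, and None = gap. The tree and taxon labels are a hypothetical toy example, not the published phylogeny.

```python
from typing import Dict, Optional, Set, Tuple, Union

# A rooted binary tree: a leaf is a taxon name; an internal node is a 2-tuple.
Tree = Union[str, Tuple["Tree", "Tree"]]


def prune(tree: Tree, states: Dict[str, Optional[int]]) -> Optional[Tree]:
    """Drop taxa with a gap (state None) and collapse nodes left with one child."""
    if isinstance(tree, str):
        return tree if states.get(tree) is not None else None
    kept = [k for k in (prune(child, states) for child in tree) if k is not None]
    if not kept:
        return None
    if len(kept) == 1:
        return kept[0]
    return (kept[0], kept[1])


def fitch(tree: Tree, states: Dict[str, int]) -> Tuple[Set[int], int]:
    """Fitch bottom-up pass: return (candidate state set, minimum number of changes)."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    ls, lc = fitch(left, states)
    rs, rc = fitch(right, states)
    common = ls & rs
    if common:
        return common, lc + rc
    return ls | rs, lc + rc + 1  # disjoint child sets imply one extra change


def count_nodes(tree: Tree) -> int:
    """Count all nodes (leaves plus internal nodes) in the tree."""
    if isinstance(tree, str):
        return 1
    return 1 + sum(count_nodes(child) for child in tree)


def dot_per_site_per_node(tree: Tree, states: Dict[str, Optional[int]]) -> Optional[float]:
    """Parsimony changes at one column, normalized by the node count of the full tree."""
    pruned = prune(tree, states)
    if pruned is None:
        return None  # every taxon has a gap at this site
    observed = {k: v for k, v in states.items() if v is not None}
    _, changes = fitch(pruned, observed)
    return changes / count_nodes(tree)


tree = ((("DENV1", "DENV2"), "DENV3"), ("WNV", "JEV"))
site = {"DENV1": 1, "DENV2": 0, "DENV3": 0, "WNV": 1, "JEV": None}
print(round(dot_per_site_per_node(tree, site), 4))  # 2 changes / 9 nodes -> 0.2222
```

Pruning gapped taxa keeps the tree binary, which is the case in which the simple intersection/union rule of Fitch's algorithm gives the minimum number of changes.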

Results 

To investigate the evolutionary dynamics of conformational flexibility in proteins encoded by Dengue virus and other flaviviruses, protein phylogenies were constructed for all homologs of the 11 proteins in Dengue virus serotype 1 (DENV-1), excluding 2K. Protein structural disorder was predicted as a proxy for conformational flexibility using IUPred (Dosztanyi et al. 2005) and PONDR-FIT (Xue, Dunbrack, et al. 2010). IUPred predictions are continuous, but a cutoff of 0.4 has been found to match the disordered regions in the DisProt database of disordered proteins (Fuxreiter et al. 2007; Sickmeier et al. 2007). Here, this cutoff of 0.4 was used to distinguish disorder from order. For PONDR-FIT, the boundary between order and disorder is 0.5 (Xue, Dunbrack, et al. 2010), so the 0.5 cutoff was used. The two discrete states (disorder and order) from each of the two prediction methods were analyzed in a phylogenetic context using parsimony.
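The discretization step can be sketched as follows. The cutoffs are the ones stated above. Two handling choices are assumptions: a score exactly at the cutoff is called ordered (the text only specifies values strictly below and above the cutoff), and gap positions (None) pass through unchanged. The names `CUTOFFS` and `discretize` are hypothetical.

```python
from typing import List, Optional

# Cutoffs from the text; scores strictly above the cutoff are called disordered.
CUTOFFS = {"IUPred": 0.4, "PONDR-FIT": 0.5}
ORDERED, DISORDERED = 0, 1


def discretize(scores: List[Optional[float]], method: str) -> List[Optional[int]]:
    """Convert continuous disorder scores into ordered (0) / disordered (1) states."""
    cutoff = CUTOFFS[method]
    return [
        None if s is None else (DISORDERED if s > cutoff else ORDERED)
        for s in scores
    ]


print(discretize([0.12, None, 0.40, 0.55], "IUPred"))  # [0, None, 0, 1]
print(discretize([0.45, 0.55], "PONDR-FIT"))           # [0, 1]
```

The same score (0.45 here) can fall on different sides of the boundary depending on the predictor. This is one reason the two methods are kept separate rather than pooled.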

Phylogenies 

The phylogeny for the envelope protein (fig. 1) was rooted with Tamana bat virus (TABV) as the outgroup, consistent with its status as a remote flavivirus species (de Lamballerie et al. 2002). After TABV, the Kamiti River virus (KRV) clade branches off, including KRV, Cell fusing agent virus (CFAV), Culex flavivirus (CxFV), and Aedes flavivirus (AeFV). Quang Binh virus (QBV) is frequently found in this clade too. At the next node, the upper branch splits further into two main clades. The upper main clade of the upper branch contains two clades. The first is the Rio Bravo virus (RBV) clade (Montana myotis leukoencephalitis virus [MMLV], Modoc virus [MODV], Apoi virus [APOIV], and RBV). The second is the Tick-borne encephalitis virus (TBEV) clade (Louping ill virus [LIV], Omsk hemorrhagic fever virus [OHFV], Langat virus [LGTV], Alkhurma hemorrhagic fever virus [AHFV], Karshi virus [KARV], Powassan virus [POWV], and TBEV). The lower main clade of the upper branch contains the yellow fever virus (YFV) clade (Sepik virus [SEPV], Wesselsbron virus [WESSV], and YFV) and the Entebbe bat virus (ENTV) clade (Yokose virus [YOKV] and ENTV). On the lower branch, the upper clade is the Dengue virus (DENV) clade, which consists of the four serotypes of DENV (DENV-1-4).

The lower clade of the lower branch is a monophyletic West Nile virus (WNV) clade (Bagaza virus [BAGV], Ilheus virus [ILHV], St. Louis encephalitis virus [SLEV], West Nile virus p [WNVp], Murray Valley encephalitis virus [MVEV], Usutu virus [USUV], Japanese encephalitis virus [JEV], and WNV). Kedougou virus (KEDV), Zika virus (ZIKV), Kokobera virus (KOKV), and Aroa virus (AROAV) rarely fall into consistent clades but tend to end up close to the DENV and WNV clades.

Fig. 1. The envelope protein phylogeny. The phylogeny of the envelope protein represents the evolutionary relationships between the different flaviviruses. The different clades are highlighted, and the clade identifier is shown in italics. Most phylogenies for the separate protein chains recover these clades, with the exception of the ZIKV clade and, to a lesser extent, the YFV clade. This phylogeny is consistent with a recent flavivirus phylogeny (Lobo et al. 2009), in which the following host categories were indicated: (1) no known arthropod vector, (2) insect-only virus, (3) tick-borne virus, and (4) mosquito-borne virus.

Full-length homologs for all DENV-1 proteins are found only in the ZIKV and WNV clades. Only E, NS3, and NS5 are found across all flavivirus clades in this study. The final phylogenies for all protein families follow clade topologies similar to that of the envelope protein. There are small variations within the clades, mostly due to the different species composition for the different proteins (supplementary fig. S1, Supplementary Material online).

Fig. 2. Disorder content per protein per species. The percentage of sites per protein chain per virus that are predicted to be structurally disordered. The viruses appear in the graph from left to right in the order listed below the graph.

Disorder Prediction 

Structural disorder was predicted for all proteins using IUPred (Dosztanyi et al. 2005) and PONDR-FIT (Xue, Dunbrack, et al. 2010). The IUPred predictions (supplementary fig. S2, Supplementary Material online) and the PONDR-FIT predictions (supplementary fig. S3, Supplementary Material online) are similar, but PONDR-FIT overpredicts disorder in the N- and C-termini (supplementary fig. S4, Supplementary Material online). For clarity, we focus on the IUPred results.

Both the percentage of disordered sites per protein and the distribution of proteins among the different viruses vary widely (fig. 2). NS2A and NS4A show very little disorder, so less emphasis is placed on these two proteins. Most viruses have 20-30% of the residues in C in the disordered state, but in ZIKV only 1% of the sites are predicted to be disordered. In Mp, the percentage of disordered sites ranges from 0% to 31%. The envelope protein, E, is 14% disordered in the outgroup TABV. The first clades to branch off, KRV and RBV, are much less disordered than the younger clades (DENV, ZIKV, and WNV). NS1, NS3, and NS5 have the highest amounts of disordered sites. Among these three, NS3 is more disordered than NS5, and NS1 is the least disordered. Most NS2B and NS4B proteins have little disorder. Moreover, even when the percentage of disorder is the same, it cannot be assumed that the sites displaying disorder are conserved. The disorder predictions were therefore mapped onto the MSAs used for the phylogeny reconstructions. They were visualized as heatmaps, with sequences in the same order as in the phylogeny for each protein (fig. 3A). The heatmap shows the variation in disorder prediction for all sites in the alignment, but it does not quantify how disorder-order states actually change over the phylogeny. The disorder predictions for each site in the alignment were therefore discretized into two states (ordered or disordered). The discrete states were analyzed site by site across the phylogeny. This captures two measures of evolutionary dynamics for each protein phylogeny and heatmap pair: 1) DOTs per site per node (fig. 3B) and 2) DOTs per branch (fig. 4 and supplementary fig. S2, Supplementary Material online).
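A toy example makes clear why the percentage of disorder alone cannot show whether disordered sites are conserved. The sketch below assumes discretized states projected onto alignment columns, with 1 = disordered, 0 = ordered, and None = gap. The two sequences are invented for illustration and the function names are hypothetical.

```python
from typing import List, Optional


def percent_disordered(states: List[Optional[int]]) -> float:
    """Percentage of non-gap sites called disordered (state 1)."""
    observed = [s for s in states if s is not None]
    if not observed:
        raise ValueError("no non-gap sites")
    return 100.0 * sum(observed) / len(observed)


def shared_disordered_columns(a: List[Optional[int]], b: List[Optional[int]]) -> List[int]:
    """Alignment columns that are disordered in both sequences."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x == 1 and y == 1]


seq_a = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]  # disorder at the N-terminus
seq_b = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]  # disorder at the C-terminus
print(percent_disordered(seq_a), percent_disordered(seq_b))  # 20.0 20.0
print(shared_disordered_columns(seq_a, seq_b))               # []
```

Both toy sequences are 20% disordered, yet none of their disordered columns coincide. The site-by-site analysis is designed to detect exactly this kind of difference.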

DOTs per Site per Node 

All proteins with disordered sites display rapid evolutionary dynamics, and DOTs are frequent (fig. 3B). Very few sites are disordered across all taxa, whereas ordered sites are much more conserved. All proteins analyzed here have ordered, highly conserved regions, but most also have disordered regions with fast evolutionary dynamics. Some disordered regions appear to expand from one clade to another. Consequently, the conformational ensembles may be changing quickly between closely related orthologs. This pattern would still allow the conserved, predominant functional conformations to form while permitting differences in the response to various stimuli.

To evaluate whether sites with rapid evolutionary dynamics of DOT also have high evolutionary rates of amino acid substitution, we compared DOT per site with evolutionary rate per site (fig. 5). Sites with rapid DOT do not necessarily have an elevated amino acid substitution rate, and there is no clear correlation between DOT and the evolutionary rate of amino acid substitution.
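The text reports the absence of a clear correlation without naming a statistic. One way to put a number on it is Spearman's rank correlation between DOT per site and relative rate per site. The sketch below assumes that choice (it is not the authors' stated method), drops sites where either value is missing, and averages ranks for ties. Ties will be common, because many sites have zero DOT.

```python
from math import sqrt
from statistics import mean
from typing import List, Optional, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values receiving the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the tie block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[Optional[float]], y: Sequence[Optional[float]]) -> float:
    """Spearman correlation over sites where both values are present."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if len(pairs) < 2:
        raise ValueError("need at least two sites with both values")
    xs, ys = zip(*pairs)
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = sqrt(sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry))
    if den == 0:
        raise ValueError("one variable is constant; correlation is undefined")
    return num / den


dot_per_site = [0.0, 0.11, 0.22, None, 0.0]   # toy values, not study data
rate_per_site = [0.3, 1.8, 0.9, 1.1, 0.2]
print(spearman_rho(dot_per_site, rate_per_site))
```

A rank-based statistic is a reasonable choice here because DOT per site is a small count divided by a constant and is far from normally distributed.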

Fig. 3. Changes in disorder across sites. (A) The combined output for the capsid. The heatmap represents order (blue to white) and disorder (white to red) as predicted by IUPred per site in the multiple sequence alignment. The phylogeny is shown on the right. The parsimony analysis of DOT across the phylogenetic tree is shown above the heatmap as changes per site per node. (B) The parsimony analysis of DOT over the phylogenetic tree as changes per site per node. (For combined outputs for all proteins and details about each phylogenetic reconstruction, see supplementary table S1 and supplementary figs. S1 and S2, Supplementary Material online.)

DOTs per Branch

Considering the branches on which at least 5% of the sites in the alignment change state (fig. 4), DOTs appear to be more frequent in the DENV, ZIKV, and WNV clades than in the KRV, APOI, and YFV clades. Proteins E, NS3, and NS5 are especially enriched in branches with rapid evolutionary dynamics of DOTs.
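The 5% criterion for flagging branches can be sketched as follows. The sketch assumes that, for every alignment column, the branches carrying an inferred order/disorder change are already known, for example from an ancestral-state reconstruction. It only tallies those changes per branch and applies the threshold. Branch names are hypothetical, and the function name `rapid_branches` is illustrative.

```python
from collections import Counter
from typing import Iterable, List


def rapid_branches(
    branches_changed_per_site: Iterable[Iterable[str]],
    alignment_length: int,
    threshold_percent: int = 5,
) -> List[str]:
    """Return branches whose DOT count reaches threshold_percent of the alignment length.

    branches_changed_per_site holds, for each alignment column, the names of the
    branches on which an order/disorder change was inferred at that column.
    """
    counts: Counter = Counter()
    for branches in branches_changed_per_site:
        counts.update(set(branches))  # count each branch at most once per site
    # Integer comparison avoids floating-point edge cases exactly at the threshold.
    return sorted(
        b for b, n in counts.items() if n * 100 >= threshold_percent * alignment_length
    )


per_site = [["DENV2"], ["DENV2", "YFV"], [], ["DENV2"]] + [[]] * 36  # 40 columns
print(rapid_branches(per_site, alignment_length=len(per_site)))  # ['DENV2']
```

With 40 columns, the 5% threshold corresponds to 2 changes. The branch with 3 changes is flagged, and the branch with only 1 is not.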

Discussion 

Structurally ordered regions are more conserved than structurally disordered regions, and many regions prone to structural disorder can rapidly transition between order and disorder. Conserved structurally ordered regions imply structural conservation. Conserved structurally disordered regions, by contrast, can hide changes in conformational properties such as secondary structure propensity (Siltberg-Liberles 2011). Comparing the average percentage of structural disorder per protein in orthologous proteins from different viruses reveals high variation among more distantly related clades. Conversely, the average structural disorder per protein varies less within clades and between more closely related clades. These results imply that structural disorder is changing among the sequences investigated here. Indeed, the parsimony analysis of disorder and order at each site across species, in a phylogenetic context, confirms the fluctuation between disorder and order. The analysis identifies sites and regions that change rapidly between order and disorder, as well as constantly ordered regions. Mapping where the different changes occur across the phylogeny provides a lineage-specific perspective. It reveals that, among these flaviviruses, the changes per lineage are not evenly distributed, nor do they seem to correlate with branch lengths. Some lineages are undergoing far more DOT than others. High lineage-specific DOT is likely to be biologically important as a route to lineage-specific specialization. Changes in conformational flexibility, subtle or not, can alter the conformational ensemble. Regions that have experienced high DOT are likely to be less important to the primary function of the protein. They may, however, be important in providing a diverse set of promiscuous functions (such as interactions and regulation). These promiscuous functions are likely to change as DOTs occur.

Here, we have compared a broad set of flavivirus proteomes, of which the DENV, ZIKV, and WNV clades have the most similar proteomes. These viruses are phenotypically different, and these differences are governed only by changes in orthologous sequences. These three clades have experienced frequent lineage-specific DOT. While the primary functions are likely to remain, the set of promiscuous functions and interactions might have changed among these proteins. Direct evidence for these points is hard to obtain, but a few circumstantial indications lend them credibility. First, antibody-dependent enhancement is the proposed mechanism for the increased severity of secondary Dengue virus infections. Antibodies targeting the envelope protein are supposed to neutralize the virion by preventing it from binding to Fc receptors. The antibody from a primary infection may have too low an affinity for a different serotype of Dengue virus, or be present at too low a concentration. In that case, the virion is not neutralized and the secondary infection is enhanced (Guzman and Vazquez 2010; Heinz and Stiasny 2012). The DENV clade is enriched in lineage-specific DOT.

Fig. 5. DOT versus evolutionary rate of amino acid substitutions. Plots of disorder-to-order transitions versus the evolutionary rate of amino acid substitutions per site: (A) capsid, (B) membrane glycoprotein precursor, (C) envelope, (D) NS1, (E) NS2A, (F) NS2B, (G) NS4A, (H) NS4B, (I) NS3, and (J) NS5.

Second, we note that NS3 and NS5 in particular have undergone high lineage-specific DOT. In a recent study, NS3 and NS5 from six different flaviviruses (DENV1, AHFV, WNV, JEV, TBEV, and Kunjin virus) were used to determine the human host-flavivirus protein-protein interaction network by comparing their interactions with 120 human cellular target proteins. Most interactions of NS3 and/or NS5 from the different flaviviruses with human proteins were species-specific. Only two of the human target proteins interacted with four of the six flaviviruses, and 82 interacted with only one of the six (Le Breton et al. 2011). This supports the hypothesis that phenotypic divergence can result from changes in conformational flexibility, here by rewiring protein-protein interaction networks. Further support comes from a recent study that found interactomes to be depleted in conserved interactions mediated by disordered proteins compared with networks of ordered proteins (Mosca et al. 2012).

Third, experimental structural biology suggests that the main conformations of the envelope protein are fairly conserved across many viruses. It also shows that the surface-accessible loops differ between DENV and TBEV (Zhang et al. 2004). Mapping the locations of the structurally divergent surface loops onto the envelope heatmap shows that these locations often correspond to regions of altered disorder or order (supplementary fig. S5, Supplementary Material online).

These observations suggest that reporting only the average structural disorder per protein or per proteome was sufficient as a first estimate of the prevalence of structural disorder. Such averages are not enough, however, to infer the evolutionary importance that disordered regions may have in a protein or in a proteome. Here, we have shown rapid evolutionary dynamics of DOT along several branches in this part of the flavivirus phylogeny. This is the first comparative genomic study of structural DOT. It shows that regions of conserved order are intermixed with regions displaying rapid evolutionary dynamics of DOT. The major trends identified here are 1) the amount and location of structural disorder fluctuate among orthologs and 2) lineage-specific (structural and, thus, functional) dynamic fluctuations are frequent and could be a major driving force for phenotypic divergence. These trends come from a group of related viruses. It has been proposed that viruses use structural disorder because they must adapt quickly to changing environments, and that disorder contributes to their pathogenicity (Xue, Williams, et al. 2010). Our results support and expand that hypothesis. The rapid change in disorder offers not only the possibility of antibody-dependent enhancement but also novel compositions of host target protein interactions. These trends may not apply to all proteins in a general sense. They nevertheless provide a first view of how structural disorder contributes to biodiversity on a genome-wide level.

Infections caused by Dengue virus are emerging. If infections cannot be prevented by vaccines, antivirals offer an alternative strategy. The results presented here are valuable for identifying virus-specific, functionally important regions that can be targeted in both vaccine and antiviral design.

Extended conformational selection encompasses not only conformational selection but also induced fit as a mechanism. Both mechanisms depend on conformational flexibility. This work focuses on the evolutionary dynamics of structural disorder and order at homologous sites. A previously identified mechanism alters disorder content during protein evolution by adding long disordered segments (Nido et al. 2012). Our results show that direct and indirect amino acid substitutions offer another mechanism for altering disorder content. Thus, the results presented here support the view that mutation-driven extended conformational selection is a potential mechanism for biological divergence.

Supplementary Material 

Supplementary figures S1-S5 and table S1 are available at Genome Biology and Evolution online (http://www.gbe.oxfordjournals.org/).